Audio to Score Matching by Combining Phonetic and Duration Information
نویسندگان
چکیده
We approach the singing phrase audio to score matching problem by using phonetic and duration information – with a focus on studying the jingju a cappella singing case. We argue that, due to the existence of a basic melodic contour for each mode in jingju music, only using melodic information (such as pitch contour) will result in an ambiguous matching. This leads us to propose a matching approach based on the use of phonetic and duration information. Phonetic information is extracted with an acoustic model shaped with our data, and duration information is considered with the Hidden Markov Models (HMMs) variants we investigate. We build a model for each lyric path in our scores and we achieve the matching by ranking the posterior probabilities of the decoded most likely state sequences. Three acoustic models are investigated: (i) convolutional neural networks (CNNs), (ii) deep neural networks (DNNs) and (iii) Gaussian mixture models (GMMs). Also, two duration models are compared: (i) hidden semi-Markov model (HSMM) and (ii) post-processor duration model. Results show that CNNs perform better in our (small) audio dataset and also that HSMM outperforms the post-processor duration model.
منابع مشابه
Optimizing image steganography by combining the GA and ICA
In this study, a novel approach which uses combination of steganography and cryptography for hiding information into digital images as host media is proposed. In the process, secret data is first encrypted using the mono-alphabetic substitution cipher method and then the encrypted secret data is embedded inside an image using an algorithm which combines the random patterns based on Space Fillin...
متن کاملConsequences of Traffic Accidents Transferred by Helicopter and Ground Ambulance: Propensity Score Matching
Introduction: The main task of the emergency medical system is to provide primary care and transfer the injured to a medical center. Studies have been conducted to investigate the outcome of helicopter and ground ambulance casualties, but they show different results. These different results may be due to the type of study, statistical methods, differences in pre-hospital emergency systems, an...
متن کاملUsing Natural Language Input and Audio Analysis for a Human-Oriented MIR System
In this paper we will present a MIR (Music Information Retrieval) system using natural language as input for human-oriented queries to large-scale music collections, applicable in web databases, cd-chargers for cars, or mobile services. The outlined system is a full-fledged architecture combining state-of-the-art approaches from the fields of natural language understanding including phonetic ma...
متن کاملProduction of English Lexical Stress by Persian EFL Learners
This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...
متن کاملA Lyrics-matching Qbh System for Inter- Active Environments
Query-by-Humming (QBH) is an increasingly prominent technology that allows users to browse through a song database by singing/humming a part of the song they wish to retrieve. Besides these cases, QBH can also be used to track the performance of a user in applications such as Score Alignment and Real-Time Accompaniment. In this paper we present an online QBH algorithm for audio recordings of si...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017